Data

The original data are stored in the file lxb-wz2c-m46p (1).xlsx. After excluding the first 100 rows, a random sample of 100 entries was drawn from the remaining data with seed 2025. Based on manual reading (human judgment) and the ground truth, a summary of each original text comment was extracted and recorded in the following features: Document.ID, Tone, Tone_justify, Tone_quote, Commenter_Role, Specialty, Patient_cost, Patient_quality, Provider_pay, Provider_quality, Other, Justification, Quote, Role_category, word_count, Concern_Level, State/Province, Provider_logistics, Provider_pay_decrease, has_substantive_tone. The result was saved to a CSV file named fill_sample.csv. It contains 18 discrete variables and 1 continuous variable with 100 rows (1,900 cells in total). Of these cells, 16.2% are empty, meaning no useful information could be extracted.
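The sampling step can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame `raw` is a hypothetical stand-in for the spreadsheet, and the exact call the author used to draw the sample is not given in the text, so `sample()` after `set.seed(2025)` is a guess at the procedure rather than a reproduction of it. The helper `empty_share()` is likewise a hypothetical name.

```r
# Hypothetical stand-in for the spreadsheet (500 toy rows, not the real data).
raw <- data.frame(
  row_id  = 1:500,
  comment = paste("comment", 1:500),
  stringsAsFactors = FALSE
)

# Step 1: exclude the first 100 rows.
remaining <- raw[-(1:100), ]

# Step 2: draw 100 rows without replacement, using the seed from the text.
set.seed(2025)
sampled <- remaining[sample(nrow(remaining), size = 100), ]

# Share of empty cells (NA or empty string) in a data frame.
empty_share <- function(df) mean(is.na(df) | df == "")

# Cell arithmetic from the text: 19 variables x 100 rows, 16.2% empty.
total_cells <- 19 * 100
approx_empty_cells <- round(0.162 * total_cells)
```

Under these numbers, 16.2% of 1,900 cells corresponds to roughly 308 empty cells.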

Missing value

Correlation between Tone and Tone_justify

Tone Quote

Below are example Tone Quotes taken from one comment:

FTC-2024-0022-0951 (Very Negative): “It is WRONG and destructive to quality of care.”, “PLEASE DO NOT LET OUT OF CONTROL GREED DESTROY MEDICAL CARE IN THIS COUNTRY!!”, “Medicine should focus on CARE, not SHAREHOLDER OR OWNER PROFIT MAKING!!!”

Commenter Roles Explanation and Exploration

Commenter Role with Specialties and Total Count
Commenter_Role    Count  Specialties
attorney          2      -
caregiver         3      -
civil_servant     2      -
nurse             6      registered
organization      4      labor_union, ngo, coalition
patient           20     military veteran, kidney transplant recipient, CFO
pharmacist        3      retired
physician         12     emergency, anesthesiologist, terminated, rheumatology
provider          6      emergency
psychotherapist   3      emergency, terminated

The following Commenter_Role values are grouped into the provider Role_category:

[1] "physician" "nurse" "provider" "pharmacist"
[5] "psychotherapist"
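A minimal sketch of how the roles could be collapsed into Role_category, using the counts from the table above. The provider list comes from the printed vector. Mapping the remaining non-patient roles to "other" is an inference from the group sizes reported later (provider 30, patient 20, other 11), not something the text states, and `to_category()` is a hypothetical helper name.

```r
# Counts per Commenter_Role, copied from the table in the text.
role_counts <- c(
  attorney = 2, caregiver = 3, civil_servant = 2, nurse = 6,
  organization = 4, patient = 20, pharmacist = 3, physician = 12,
  provider = 6, psychotherapist = 3
)

# Roles treated as providers (the vector printed in the text).
provider_roles <- c("physician", "nurse", "provider",
                    "pharmacist", "psychotherapist")

# Hypothetical helper: map a role to patient / provider / other,
# keeping NA for comments without role information.
to_category <- function(role) {
  ifelse(is.na(role), NA_character_,
         ifelse(role %in% provider_roles, "provider",
                ifelse(role == "patient", "patient", "other")))
}

category <- to_category(names(role_counts))
totals <- sapply(split(role_counts, category), sum)

# Comments without an identifiable role, out of 100 sampled entries.
n_missing_role <- 100 - sum(role_counts)
```

With this mapping the totals come out as 30 providers, 20 patients and 11 others, leaving 39 comments with no role, which agrees with the Role_category counts reported in the word-count summaries.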

Examples of Comments with Hard-to-Identify Commenter Roles

These comments lack clear identifying information, which makes the commenter's role difficult to classify. Most of them are therefore labeled NA in Commenter_Role.


Correlation Between Commenter Role and Tone

Association Between Commenter Role and Tone

  • A chi-square test indicates that Commenter_Role is significantly associated with the Tone of the comment (\(p<0.05\)).

  • The Contingency Table Heatmap shows counts for each combination of Role ("patient", "provider", "other", "no information") and Tone category ("negative", "very negative", "neutral", "positive").
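For readers who want to see the mechanics, a chi-square test of independence on a role-by-tone table can be sketched as below. The counts are hypothetical toy values chosen only so that the example runs cleanly. They are not the counts behind the heatmap, which are not listed in the text.

```r
# Hypothetical toy counts (NOT the report's data): rows are role
# categories, columns are tone categories.
tab <- matrix(
  c(20, 25,  8, 7,
    30, 18, 10, 7,
    10,  9, 12, 9,
    25, 28,  6, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Role = c("patient", "provider", "other", "no information"),
    Tone = c("negative", "very negative", "neutral", "positive")
  )
)

# Pearson chi-square test of independence between Role and Tone.
test <- chisq.test(tab)

test$statistic  # X-squared
test$parameter  # degrees of freedom: (4 - 1) * (4 - 1) = 9
test$p.value    # compare against 0.05
test$expected   # expected counts under independence
```

In the actual sample only one comment has a positive tone, so some expected counts are likely to be small. In that situation `chisq.test()` warns that the approximation may be inaccurate, and the p-value is best read with some caution.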

Distribution of Tone per Role Category

The histogram showing the distribution of tone within each role category reveals the following:

  • Among commenters with no role information, negative and very negative tones account for the largest shares. Combined with the pattern that those directly experiencing PE tend to show negative or very negative tones, this suggests that many commenters without role information may in fact belong to the patient or provider groups, if they had to be classified as patient, provider, or other.

  • For providers and patients, negative and very negative tones account for more comments than neutral or positive tones. Moreover, relative to patients, providers lean more toward the negative tone than the very negative tone. This may reflect providers’ professional training and occupational discipline, which help them regulate emotional expression when describing issues.

  • For Other, the positive tone contributes the least, while negative, very negative, and neutral tones show no marked imbalance. This suggests that people who do not directly experience PE tend not to hold supportive attitudes toward it, which may indicate that PE’s impact is broad and that its perceived societal effect is primarily negative.
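The within-role comparison described in the bullets above amounts to reading row-wise shares of a role-by-tone table. A small sketch with hypothetical toy counts (again, not the report's data) shows the computation:

```r
# Hypothetical toy counts (NOT the report's data).
tab <- matrix(
  c( 9,  9, 1, 1,
    16, 10, 3, 1,
     4,  3, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Role = c("patient", "provider", "other"),
    Tone = c("negative", "very negative", "neutral", "positive")
  )
)

# Share of each tone within each role: every row sums to 1.
shares <- prop.table(tab, margin = 1)

# Percentages, rounded for display.
round(100 * shares, 1)

# Ratio of negative to very negative comments per role.
neg_ratio <- tab[, "negative"] / tab[, "very negative"]
```

Comparing `neg_ratio` across rows is one way to express a statement such as "providers lean more toward negative than very negative, relative to patients".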

Exploration of Comment Length

The word_count feature records the length of each comment's text in words. For comments provided as attached files, I counted the words manually. For comments recorded directly in the original data cell, I counted the words with the str_count() function in R.
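As an illustration of automated word counting, here is a base R sketch that counts whitespace-separated tokens. It is an assumption about what "word" means here: the pattern the author passed to str_count() is not stated, and a call such as `stringr::str_count(text, "\\S+")` would be one plausible equivalent. The helper name `count_words()` is hypothetical.

```r
# Hypothetical helper: count whitespace-separated tokens in each string.
# Empty strings and NA are counted as 0 words.
count_words <- function(text) {
  text <- trimws(text)
  ifelse(is.na(text) | text == "",
         0L,
         lengths(strsplit(text, "\\s+")))
}

examples <- c(
  "It is WRONG and destructive to quality of care.",
  "  two   words ",
  "",
  NA_character_
)

counts <- count_words(examples)
```

Punctuation attached to a word (as in "care.") stays part of that token, so it does not inflate the count.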

Overall Summary Word Count
count  missing  min  q1     median  mean    q3      max   sd
100    0        1    46.25  125.5   333.41  226.25  6297  806.8054
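The columns of this summary can be computed in base R roughly as follows. This is a sketch, not the author's code: `summarise_words()` is a hypothetical helper, `count` is taken here to mean the number of rows including missing values, and quartiles use R's default `quantile()` method, which may or may not match what the report used.

```r
# Hypothetical helper producing the same columns as the summary table.
summarise_words <- function(x) {
  ok <- x[!is.na(x)]
  q <- quantile(ok, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(
    count   = length(x),
    missing = sum(is.na(x)),
    min     = min(ok),
    q1      = q[1],
    median  = q[2],
    mean    = mean(ok),
    q3      = q[3],
    max     = max(ok),
    sd      = sd(ok)
  )
}

# Toy vector (NOT the report's data), with one missing value.
toy <- c(1, 3, 10, 50, 200, NA)
toy_summary <- summarise_words(toy)

# A group with a single observation has no defined standard deviation.
single_summary <- summarise_words(3)
```

The last call shows why a group containing one comment reports `NA` for sd: the sample standard deviation needs at least two values.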


Summary Word Count by Tone
Tone           count  min  q1     median  mean      q3      max   sd
negative       44     1    77.25  142     387.7727  271.75  6297  1002.44962
neutral        8      3    3.00   12      37.8750   21.25   227   76.89592
positive       1      3    3.00   3       3.0000    3.00    3     NA
very_negative  47     3    68.00  130     339.8511  173.50  3724  663.49463

Kruskal-Wallis Test for Word Count by Tone
Statistic  Df  P_value
13.693     3   0.003354

Dunn's Test for Word Count by Tone
Comparison                 Z           P.unadj      P.adj
negative - neutral         3.3397435   0.000838558  0.005031
negative - positive        1.7401757   0.081828165  0.490969
neutral - positive         0.4489645   0.653457281  1.000000
negative - very_negative   0.7190723   0.472096352  1.000000
neutral - very_negative    -2.9618691  0.003057778  0.018347
positive - very_negative   -1.5921502  0.111350954  0.668106
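The reported p-values can be cross-checked from the printed statistics alone. The sketch below assumes the Kruskal-Wallis p-value comes from the chi-square approximation and that P.adj is a Bonferroni adjustment over the six comparisons. The text does not name the adjustment method, but the numbers are consistent with it.

```r
# Kruskal-Wallis: upper-tail chi-square probability for the statistic.
kw_p <- pchisq(13.693, df = 3, lower.tail = FALSE)

# Unadjusted Dunn p-values, copied from the table (same row order).
p_unadj <- c(0.000838558, 0.081828165, 0.653457281,
             0.472096352, 0.003057778, 0.111350954)

# Bonferroni: multiply by the number of comparisons, cap at 1.
p_adj <- p.adjust(p_unadj, method = "bonferroni")

# Adjusted values as printed in the table, for comparison.
p_adj_reported <- c(0.005031, 0.490969, 1, 1, 0.018347, 0.668106)
max_diff <- max(abs(p_adj - p_adj_reported))
```

After adjustment, only the two comparisons involving neutral (against negative and against very_negative) fall below 0.05.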

Summary Word Count by Role_category
Role_category  count  min  q1     median  mean      q3      max   sd
other          11     1    3.0    6.0     260.4545  206.00  1741  521.6973
patient        20     3    138.5  178.5   258.7000  264.25  1199  262.9307
provider       30     3    85.5   135.0   305.1000  283.50  2720  507.3077
NA             39     3    31.5   78.0    414.0769  132.50  6297  1175.3598

Kruskal-Wallis Test for Word Count by Role_category (NA group excluded, hence Df = 2)
Statistic  Df  P_value
4.455      2   0.1078

Dunn's Test for Word Count by Role_category
Comparison          Z           P.unadj    P.adj
other - patient     -2.1099855  0.0348596  0.1046
other - provider    -1.4864119  0.1371702  0.4115
patient - provider  0.9287684   0.3530091  1.0000
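As a consistency check on the grouped summary, the group counts and means should reproduce the overall mean of 333.41 words. The sketch below uses only numbers printed in the tables. It also notes that a Kruskal-Wallis Df of 2 corresponds to three tested groups, which suggests the NA group was left out of the test.

```r
# Counts and means per Role_category, copied from the summary table.
role <- data.frame(
  category = c("other", "patient", "provider", NA),
  count    = c(11, 20, 30, 39),
  mean     = c(260.4545, 258.7000, 305.1000, 414.0769),
  stringsAsFactors = FALSE
)

# Count-weighted mean of the group means equals the overall mean.
overall_mean <- sum(role$count * role$mean) / sum(role$count)

# Degrees of freedom of a Kruskal-Wallis test: number of groups minus 1.
tested_groups <- sum(!is.na(role$category))
kw_df <- tested_groups - 1
```

The weighted mean rounds to 333.41, matching the overall summary, and `kw_df` is 2, matching the reported Df.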

Summary of Word Count by Concern Level
Concern_Level  count  min  q1      median  mean       q3      max   sd
No Concern     9      3    5.00    13.0    21.11111   25.00   61    20.43554
Low            16     19   35.75   109.0   196.00000  144.75  1741  415.31129
Medium         37     3    66.00   129.0   514.37838  271.00  6297  1220.97753
High           30     66   119.75  155.5   315.33333  270.25  1805  415.47350
Highest        8      1    3.00    4.5     190.37500  297.25  756   295.83873

Kruskal-Wallis Test for Word Count by Concern_Level
Statistic  Df  P_value
23.376     4   0.0001065

Dunn's Test for Word Count by Concern_Level
Comparison            Z           P.unadj       P.adj
High - Highest        2.4754416   0.0133071529  0.1330715
High - Low            2.1517089   0.0314202915  0.3142029
Highest - Low         -0.7364734  0.4614426671  1.0000000
High - Medium         1.2797534   0.2006318712  1.0000000
Highest - Medium      -1.7198786  0.0854545159  0.8545452
Low - Medium          -1.1753905  0.2398385037  1.0000000
High - No Concern     4.4166745   0.0000100231  0.0001002
Highest - No Concern  1.4273936   0.1534664769  1.0000000
Low - No Concern      2.4299783   0.0150997288  0.1509973
Medium - No Concern   3.6704134   0.0002421585  0.0024216
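To read the pairwise table, it helps to list which comparisons remain significant after adjustment. The sketch below copies Z and P.adj from the table and assumes P.unadj is the two-sided normal tail probability of Z, which agrees with the printed values.

```r
# Z statistics and adjusted p-values, copied from the table above.
dunn <- data.frame(
  comparison = c("High - Highest", "High - Low", "Highest - Low",
                 "High - Medium", "Highest - Medium", "Low - Medium",
                 "High - No Concern", "Highest - No Concern",
                 "Low - No Concern", "Medium - No Concern"),
  z = c(2.4754416, 2.1517089, -0.7364734, 1.2797534, -1.7198786,
        -1.1753905, 4.4166745, 1.4273936, 2.4299783, 3.6704134),
  p_adj = c(0.1330715, 0.3142029, 1, 1, 0.8545452,
            1, 0.0001002, 1, 0.1509973, 0.0024216),
  stringsAsFactors = FALSE
)

# Two-sided p-value from each Z statistic.
dunn$p_unadj <- 2 * pnorm(-abs(dunn$z))

# Comparisons that stay below 0.05 after adjustment.
significant <- dunn$comparison[dunn$p_adj < 0.05]
```

Only "High - No Concern" and "Medium - No Concern" pass the 0.05 threshold, so the overall Kruskal-Wallis result appears to be driven mainly by the short comments in the No Concern group.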

Correlation between State and Concern Level

Pearson’s Chi-squared test

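data: contingency_matrix
X-squared = 120.64, df = 108, p-value = 0.1912

At the 0.05 level, this test does not indicate a significant association between State/Province and Concern_Level.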
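The p-value can be reproduced from the printed statistic, and the degrees of freedom say something about how sparse the table is. The sketch below assumes that all five Concern_Level values appear in the table. Under that assumption, df = 108 implies 28 distinct states or provinces.

```r
# Upper-tail chi-square probability for the reported statistic.
x_squared <- 120.64
df <- 108
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Assumption: 5 Concern_Level columns, so df = (n_states - 1) * (5 - 1).
n_concern <- 5
n_states <- df / (n_concern - 1) + 1
n_cells <- n_states * n_concern

# Average expected count per cell with 100 comments.
mean_expected <- 100 / n_cells
```

With about 0.7 expected comments per cell on average, the chi-square approximation is likely to be unreliable for this table. A simulated p-value (`chisq.test(..., simulate.p.value = TRUE)`) or grouping states into larger regions would be possible alternatives, though the report does not say either was tried.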

# Optional export of the sample and of its first 50 rows (left disabled):
# write.csv(df, "100_entries_sample.csv")
# sample_50 <- read.csv("100_entries_sample.csv")
# sample_50 <- sample_50[1:50, ]
# sample_50
# write.csv(sample_50, "50_out_100_sample.csv")